This document provides an overall description of the various declassified satellite imagery datasets that I’ve been playing around with for the last few months. They were all downloaded from https://earthexplorer.usgs.gov/.

Descriptive Statistics

Total Images

The dataset consists of 837,088 images taken between 1960 and 1984 by 5 different satellite systems (see Appendix A for more information about the different datasets).

The following plot shows the temporal distribution of photos, broken down by data source.

n_pics_per_year_grouped <- sat %>%
  mutate(`Year` = as.numeric(substr(`Acquisition Date`, 1, 4))) %>%
  group_by(Year, `Data Source`) %>%
  summarise(n_pics = n(), .groups = "drop")
n_pics_per_year_grouped %>%
  ggplot(aes(x = Year)) +
  geom_area(aes(y = n_pics, fill = `Data Source`), 
              alpha = 0.9) + 
  scale_fill_manual(values = c("declass1" = "#7eb0d5", "declass2" = "#fd7f6f", "declass3" = "#01A66F")) +
  ylab("Number of Pictures")
## Warning: Removed 1 rows containing non-finite values (`stat_align()`).

Average Image Footprint

It’s worth mentioning that the types of images available vary widely both within and across datasets. To take just one dimension of variation, here is the average footprint of photos for each dataset:

sat_sf <- st_as_sf(sat, wkt = "geometry")
sat_sf$area <- st_area(sat_sf)

sat_sf_trimmed <- sat_sf %>%
  filter(area != 0)

avgs <- sat_sf_trimmed %>%
  st_drop_geometry() %>%
  as.data.frame() %>%
  group_by(`Data Source`) %>%
  summarise(avg_area = mean(area))

# Calculate the average area of the geometries
mean(sat_sf_trimmed$area) # ~3.22, or about 69*69*3.22 = 15330 sq miles
## [1] 3.223478
avgs %>%
  ggplot(aes(x = `Data Source`, y = avg_area, fill = `Data Source`)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("declass1" = "#7eb0d5", "declass2" = "#fd7f6f", "declass3" = "#01A66F")) +
  ylab("Average Area (in square lat/lng degrees)")

Many of the frames, especially from the earlier programs in declass1, have truly massive footprints. The unit is “square lat/long degrees,” which is a bit imprecise (a degree of longitude shrinks toward the poles), but at the equator one square degree corresponds to roughly 69*69 = ~4,760 square miles. The overall average footprint for an image frame in the dataset is thus about 3.22 * 4,760 = ~15,000 square miles.
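Since a degree of longitude shrinks with latitude, the square-degree figure overstates footprints away from the equator. Here’s a minimal sketch of the conversion; the 69-miles-per-degree constant and the cosine correction are my own back-of-envelope assumptions, not something applied in the averages above:

```r
# Convert an area in square lat/lng degrees to approximate square miles.
# Assumes ~69 miles per degree of latitude; the cos() term shrinks the
# east-west extent of a degree of longitude at higher latitudes.
deg2_to_sq_miles <- function(area_sq_deg, lat_deg = 0) {
  miles_per_deg <- 69
  area_sq_deg * miles_per_deg^2 * cos(lat_deg * pi / 180)
}

deg2_to_sq_miles(3.22)      # at the equator: ~15,330 sq miles
deg2_to_sq_miles(3.22, 60)  # at 60 degrees N: about half that
```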

library(shiny)

# Define UI for the Shiny app
ui <- fluidPage(
  actionButton("sampleButton", "Re-sample"),
  leafletOutput("map")
)

generate_sample <- function(n) {
  sample_n(sat_sf_trimmed, n)
}

# Define server logic
server <- function(input, output) {
  # Draw a fresh sample at startup and whenever the button is clicked;
  # reading input$sampleButton makes the reactive re-run on each click
  sampled_data <- reactive({
    input$sampleButton
    generate_sample(5)
  })
  
  # Render the leaflet map from the current sample
  output$map <- renderLeaflet({
    leaflet() %>%
      addTiles() %>%
      addPolygons(data = sampled_data())
  })
}

# Run the Shiny app
shinyApp(ui, server)
Shiny applications not supported in static R Markdown documents

typical_picture <- sat_sf_trimmed %>% 
  sample_n(25)

str(typical_picture[1])
## sf [25 × 2] (S3: sf/tbl_df/tbl/data.frame)
##  $ Entity ID: chr [1:25] "DS1101-1037DA014" "DS1052-1024DA072" "DS1029-1006DF071" "DS1110-1121DA008" ...
##  $ geometry :sfc_POLYGON of length 25; first list element: List of 1
##   ..$ : num [1:5, 1:2] 115 119 119 115 115 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
##   ..- attr(*, "names")= chr "Entity ID"
leaflet() %>%
  addTiles() %>%
  addPolygons(data = typical_picture)

Scratch

Here’s a random sample of images with roughly that footprint:

typical_picture <- sat_sf_trimmed %>% 
  filter(area < 3.23, area > 3.21) %>%
  sample_frac(0.1)

str(typical_picture[1])
## sf [59 × 2] (S3: sf/tbl_df/tbl/data.frame)
##  $ Entity ID: chr [1:59] "DS009043007DA043" "DZB1212-500098L003001" "D3C1211-400692F001" "D3C1203-100065A040" ...
##  $ geometry :sfc_POLYGON of length 59; first list element: List of 1
##   ..$ : num [1:5, 1:2] 56.4 66.1 66.2 56.9 56.4 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA
##   ..- attr(*, "names")= chr "Entity ID"
leaflet() %>%
  addTiles() %>%
  addPolygons(data = typical_picture)

Calculate and Plot Facility Coverage

# fac_caps <- fac_caps_orig
# 
# # Build coverage DF
# coverage <- data.frame(year = integer(),
#                        num_extant_facilities = integer(),
#                        num_spotted_facilities = integer(),
#                        coverage = double())
# 
# spotted <- c()
# 
# # loop thru years
# for (year in as.integer(format(min_date, "%Y")):as.integer(format(max_date, "%Y"))) {
#   year_date <- as.Date(paste(year, "-01-01", sep = ""))
#   
#   # extant facilities
#   fac_exist <- facs %>%
#     filter(start_date <= year_date) %>%
#     distinct(facility_name)  # Get unique facility names
#   
#   n_fac_exist <- nrow(fac_exist)
#   
#   # spotted facilities
#   fac_spotted <- fac_caps %>%
#     filter(`Acquisition Date` <= year_date) %>%
#     distinct(facility_name)  # Get unique facility names
#   
#   spotted <- unique(append(spotted,fac_spotted$facility_name))
#   n_fac_spotted <- length(spotted)
#   
#   # remove earlier facilities for next loop
#   fac_caps <- fac_caps %>% 
#     filter(`Acquisition Date` > year_date)
#   
#   coverage <- bind_rows(coverage, data.frame(year = year,
#                                              num_extant_facilities = n_fac_exist,
#                                              num_spotted_facilities = n_fac_spotted,
#                                              coverage = n_fac_spotted / n_fac_exist))
# }
# 
# ## Plot
# ggplot(coverage, aes(x = year)) +
#   geom_ribbon(aes(ymin = 0, ymax = num_extant_facilities, fill = "Extant Facilities"), alpha = 0.75) +
#   geom_ribbon(aes(ymin = 0, ymax = num_spotted_facilities, fill = "Spotted Facilities"), alpha = 0.75) +
#   scale_fill_manual(values = c("Extant Facilities" = "#7eb0d5", "Spotted Facilities" = "#fd7f6f")) +
#   labs(y = "Number of Facilities", 
#        x = "Year",
#        fill = "") +
#   theme_minimal()
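The year-by-year loop above can be condensed considerably. Here’s a self-contained sketch of the same coverage calculation on made-up facility data (the table contents below are hypothetical, purely for illustration; the real `facs` and `fac_caps` tables use full dates rather than years):

```r
# Hypothetical facilities with the year each became operational
facs <- data.frame(facility_name = c("A", "B", "C"),
                   start_year    = c(1960, 1962, 1965))
# Hypothetical image captures of those facilities
caps <- data.frame(facility_name = c("A", "A", "C"),
                   capture_year  = c(1961, 1963, 1966))

# For each year: count extant facilities and the cumulative set ever spotted
coverage <- do.call(rbind, lapply(1960:1966, function(y) {
  extant  <- sum(facs$start_year <= y)
  spotted <- length(unique(caps$facility_name[caps$capture_year <= y]))
  data.frame(year = y,
             num_extant_facilities  = extant,
             num_spotted_facilities = spotted,
             coverage = spotted / extant)
}))

coverage$coverage[coverage$year == 1966]  # 2/3: A and C spotted, B never
```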

Appendix A: Satellite Imagery Dataset Overview

Declassified 1

Declassified 1 is the product of a blanket declassification in 1995 and purportedly represents all of the images from the following satellite programs:

  • CORONA: 1960-1972

  • ARGON: 1962-1964

  • LANYARD: 1963

It’s not clear which images come from which satellite systems.

The dataset contains 837,088 images. A handful of different camera setups were used across these programs:

unique(sat1$`Camera Resolution`)
## [1] "Vertical Medium" "Stereo Medium"   "Vertical Low"    "Vertical High"  
## [5] "Stereo High"
unique(sat1$`Camera Type`)
## [1] "Vertical"     "Aft"          "Forward"      "Cartographic"

It’s not clear how the resolution of these cameras compares to the later generations, KH-7 and KH-9.

Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):

Declassified 2

Declassified 2 is the product of a 2002 release that partially declassified imagery from the following programs:

  • KH-7 (GAMBIT): images taken between 1963 and 1967, the full lifespan of GAMBIT.

  • KH-9 (HEXAGON): images taken from 1973 to 1980, a subset of the operational period of HEXAGON.

It’s not clear whether all of the images from KH-7 were declassified or whether some were withheld. Only a subset of the KH-9 images were declassified.

The dataset contains 46,699 images.

KH-7 was used for higher-resolution surveillance. KH-9 had both a lower-resolution mapping camera and a higher-resolution surveillance camera, but only the mapping images were included in this release:

unique(sat2$`Camera Resolution`)
## [1] "2 to 4 feet"   "20 to 30 feet"
unique(sat2$`Camera Type`)
## [1] "KH-7 High Resolution Surveillance" "KH-9 Lower Resolution Mapping"

Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):

Declassified 3

Declassified 3 is the product of a 2011 release that partially declassified imagery from KH-9 (HEXAGON), which ran from 1971 to 1984. This release includes images from the high-resolution surveillance cameras, but the website says that “almost all of the imagery from these cameras were declassified in 2011,” implying that some images remain classified.

The dataset contains 531,321 images. Note that the website says that “The process to ingest and generate browse imagery for Declass-3 is ongoing,” and suggests that the HEXAGON program generated over 670,000 scenes, indicating that the dataset we currently have access to is missing a substantial chunk of HEXAGON’s images.

Both the terrain mapping and surveillance imagery were included in this declassification:

unique(sat3$`Camera Resolution`)
## [1] "2 to 4 feet"   "20 to 30 feet"
unique(sat3$`Camera Type`)
## [1] "High Resolution Surveillance Camera - Forward"
## [2] "High Resolution Surveillance Camera - Aft"    
## [3] "Lower Resolution Terrain Mapping Camera"

Here is the total “footprint” of the images in this dataset (shapefile supplied by USGS):

Issues with the data

The satellite imagery dataset has a number of limitations.

Inaccurate & Missing Coordinates

It’s not clear that the coordinates listed in the datasets are always accurate. USGS says “We do recommend viewing both the ‘Preview Image’ and ‘Show All Fields’ metadata before submitting your order. Browse viewing is a critical component in the order validation process. The effects of cloud cover and the accuracy of the latitude and longitude coordinates can greatly affect the usability of the data.” Likewise, the description of Declassified 1 says “Mathematical calculations based on camera operation and satellite path were used to approximate image coordinates. Since the accuracy of the coordinates varies according to the precision of information used for the derivation, users should inspect the preview image to verify that the area of interest is contained in the selected frame.”

Furthermore, 11,409 rows in the dataset (all from declass2 and declass3) simply have no associated coordinates:

nrow(filter(sat, geometry == "POLYGON ((0 0,0 0,0 0,0 0,0 0))"))
## [1] 11409
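Matching the literal WKT string works here, but it’s brittle if the formatting (whitespace, precision) ever changes. A geometry-based check flags the same degenerate footprints by their zero area. Here’s a base-R sketch using the shoelace formula; in practice, `st_area()` on the sf column (computed earlier) does the same job:

```r
# Shoelace formula: area of a polygon ring given its vertex coordinates
shoelace_area <- function(x, y) {
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

# The placeholder footprint collapses to a single point, so its area is 0
ring <- matrix(0, nrow = 5, ncol = 2)
shoelace_area(ring[, 1], ring[, 2])  # 0

# Sanity check on a unit square
shoelace_area(c(0, 1, 1, 0), c(0, 0, 1, 1))  # 1
```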

Missing images & incomplete data

As mentioned above, declass3 is a work in progress; over 100,000 images seem to be missing from it. There are also discrepancies between the number of images advertised for all three datasets on USGS Earth Explorer and the number that you can actually download, although the result is that you end up with more images than expected, not fewer.

It’s also not clear whether some critical images remain classified. This source indicates that KH-7 images of Israel are still classified, for instance. The same source also has details on KH-8 (GAMBIT-3), imagery from which doesn’t seem to have been declassified at all. There are other satellite surveillance programs from overlapping time periods, like GRAB, POPPY, and QUILL, though I haven’t looked into which of these captured imagery.

No Acquisition Date

One particular set of 14 photos from mission 1205-3 in declass3 has no acquisition date listed.

nrow(filter(sat, is.na(`Acquisition Date`)))
## [1] 14

Appendix B: Counting Captures